Nonparametric Bayesian Learning of Other Agents' Policies in Interactive POMDPs

Authors

  • Alessandro Panella
  • Piotr J. Gmytrasiewicz
Abstract

We consider an autonomous agent facing a partially observable, stochastic, multiagent environment where the unknown policies of other agents are represented as finite state controllers (FSCs). We show how an agent can (i) learn the FSCs of the other agents, and (ii) exploit these models during interactions. To separate the issues of off-line versus on-line learning, we consider here an off-line two-phase approach. During the first phase, the agent observes the other agent(s) interacting with the environment (the observations may be imperfect, and the learning agent does not take part in the interaction). The collected data is used to learn an ensemble of FSCs that explain the behavior of the other agent(s) using a Bayesian nonparametric (BNP) approach. We verify the quality of the learned models during the second phase by allowing the agent to compute its own optimal policy and interact with the observed agent. The optimal policy for the learning agent is obtained by solving an interactive POMDP in which the states are augmented by the other agent(s)' possible FSCs. The advantage of using the Bayesian nonparametric approach in the first phase is that the complexity (number of nodes) of the learned controllers is not bounded a priori. Our two-phase approach is preliminary and separates the BNP learning from the complexities of learning on-line while the other agent may be modifying its policy (the on-line approach is the subject of our future work). We describe our implementation and results in a multiagent Tiger domain. Our results show that learning improves the agent's performance, which increases with the amount of data collected during the learning phase.
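
To make the first phase more concrete, the sketch below illustrates how a Chinese restaurant process (CRP) prior leaves the number of controller nodes unbounded, in the spirit of the BNP approach the abstract describes. This is a minimal, hypothetical Python sketch, not the authors' implementation: the action and observation sets are those of the Tiger domain, but the function names, the concentration parameter ALPHA, and the single forward sampling pass are illustrative assumptions; a full learner would resample node assignments jointly with the controller's action-selection and node-transition distributions (e.g., by Gibbs sampling).

    import random
    from collections import defaultdict

    # Hypothetical sketch of the first phase: assigning FSC nodes to steps of
    # an observed action/observation trace under a CRP prior, so the number
    # of controller nodes can grow with the data rather than being fixed.

    ACTIONS = ["listen", "open-left", "open-right"]   # Tiger-domain actions
    OBSERVATIONS = ["growl-left", "growl-right"]      # Tiger-domain observations

    ALPHA = 1.0  # CRP concentration: larger values favor more controller nodes

    def crp_sample_node(node_counts):
        """Sample a node index from a CRP over current node occupancies."""
        total = sum(node_counts.values()) + ALPHA
        r = random.uniform(0, total)
        for node, count in node_counts.items():
            r -= count
            if r <= 0:
                return node                      # reuse an existing node
        return max(node_counts, default=-1) + 1  # open a new node

    def sample_node_sequence(trace):
        """Assign a controller node to each step of one observed trace.

        `trace` is a list of (action, observation) pairs recorded while
        watching the other agent. A full sampler would condition these
        assignments on the likelihood of the observed actions; here we
        only draw a single forward pass from the CRP prior.
        """
        node_counts = defaultdict(int)
        assignments = []
        for _action, _observation in trace:
            node = crp_sample_node(node_counts)
            node_counts[node] += 1
            assignments.append(node)
        return assignments

    if __name__ == "__main__":
        # A toy trace: the observed agent listens twice, then opens a door.
        trace = [("listen", "growl-left"),
                 ("listen", "growl-left"),
                 ("open-right", "growl-right")]
        print(sample_node_sequence(trace))

In the second phase, each learned controller would be folded into the interactive POMDP's augmented state space, so that the learning agent's belief ranges jointly over the physical state and the other agent's current FSC node.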

Similar Resources

Learning and Planning in Multiagent POMDPs Using Finite-State Models of Other Agents

My thesis work provides a new framework for planning in multiagent, stochastic, partially observable domains with little knowledge about other agents. The relevance of the contribution lies in the variety of practical applications this approach can help tackle, given the very generic assumptions about the environment and the other agents. In order to cope with this level of generality, Bayesi...

Learning Others' Intentional Models in Multi-Agent Settings Using Interactive POMDPs

Interactive partially observable Markov decision processes (I-POMDPs) provide a principled framework for planning and acting in a partially observable, stochastic, multiagent environment, extending POMDPs to multiagent settings by including models of other agents in the state space and forming a hierarchical belief structure. In order to predict other agents' actions using I-POMDP, we propo...

Learning Without State-Estimation in Partially Observable Markovian Decision Processes

Reinforcement learning (RL) algorithms provide a sound theoretical basis for building learning control architectures for embedded agents. Unfortunately, all of the theory and much of the practice of RL (see Barto et al. for an exception) is limited to Markovian decision processes (MDPs). Many real-world decision tasks, however, are inherently non-Markovian, i.e., the state of the environment is only incomp...

Efficient Planning for Factored Infinite-Horizon DEC-POMDPs

Decentralized partially observable Markov decision processes (DEC-POMDPs) are used to plan policies for multiple agents that must maximize a joint reward function but do not communicate with each other. The agents act under uncertainty about each other and the environment. This planning task arises in the optimization of wireless networks and other scenarios where communication between agents is r...

A Framework for Optimal Sequential Planning in Multiagent Settings

Research in autonomous agent planning is gradually moving from single-agent environments to those populated by multiple agents. In single-agent sequential environments, partially observable Markov decision processes (POMDPs) provide a principled approach for planning under uncertainty. They improve on classical planning by not only modeling the inherent non-determinism of the probl...

Publication date: 2015